Integrating Probabilistic and Knowledge-based Approaches to Corpus Parsing

Authors

  • John Carroll
  • Ted Briscoe
Abstract

We have developed a prototype system for syntactic parsing of corpus text based on a wide-coverage unification-based grammar of English and domain-independent statistical techniques for selecting the most plausible parses from the typically large number licensed by the grammar. Although the results from initial experiments are promising, the system is ‘brittle’, relying particularly on the correctness and completeness of lexical entries. We are currently concentrating on parsing large amounts of tagged text with a relatively simple, but robust, grammar of tag sequences and punctuation. This grammar produces coarse phrasal analyses of sentences from which possible complementation patterns can be extracted, allowing omissions in the set of lexical entries to be remedied.

1 The Probabilistic LR Parsing System

Briscoe & Carroll (1993) describe an approach to probabilistic parse selection using a large unification-based grammar of English. The grammar contains approximately 800 phrase structure rules written in the Alvey Natural Language Tools (ANLT) formalism (Briscoe et al. 1987), a syntactic variant of the Definite Clause Grammar formalism (Pereira & Warren 1980). The ANLT grammar has wide coverage and has been shown, for instance, to be capable of assigning a correct analysis to 96.8% of a corpus of 10,000 noun phrases extracted randomly from manually analysed corpora (Taylor, Grover & Briscoe 1989). The grammar is linked to a lexicon containing about 64,000 entries for 40,000 lexemes, including ...

Footnote 1: This research is supported in part by ESPRIT BRA 7315 ‘The Acquisition of Lexical Knowledge’ (ACQUILEX II).


Similar resources

The Effect of Morphology on Persian Dependency Parsing

Data-driven systems can be adapted easily to different languages and domains. This trend has led to the introduction of data-driven approaches in dependency parsing. The existence of appropriate corpora containing sentences and their associated dependency trees is the only prerequisite for data-driven approaches. Despite obtaining highly accurate results for the dependency parsing task in English langu...


A Probabilistic Model of Learning Fields in Islamic Economics and Finance

In this paper an epistemological model of learning fields of probabilistic events is formalized. It is used to explain resource allocation governed by pervasive complementarities as the sign of unity of knowledge. Such an episteme is induced epistemologically into interacting, integrating and evolutionary variables representing the problem at hand. The end result is the formalization of a p...


Japanese Dependency Parsing Using a Tournament Model

In Japanese dependency parsing, Kudo’s relative preference-based method (Kudo and Matsumoto, 2005) outperforms both deterministic and probabilistic CKY-based parsing methods. In Kudo’s method, for each dependent word (or chunk) a log-linear model estimates the relative preference of all other candidate words (or chunks) for being its head. This cannot be considered in the deterministic parsing me...


Learning Structured Classifiers for Statistical Dependency Parsing

My research is focused on developing machine learning algorithms for inferring dependency parsers from language data. By investigating several approaches I have developed a unifying perspective that allows me to share advances between both probabilistic and non-probabilistic methods. First, I describe a generative technique that uses a strictly lexicalised parsing model, where all the parameter...


The Role of Pragmatics in Solving the Winograd Schema Challenge

Different aspects and approaches to commonsense reasoning have been investigated in order to provide solutions for the Winograd Schema Challenge (WSC). The vast complexities of natural language processing (parsing, assigning word sense, integrating context, pragmatics and world-knowledge, ...) give broad appeal to systems based on statistical analysis of corpora. However, solutions based purely...




Publication date: 2017